Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a type of neural network designed for processing sequences by leveraging hidden states to capture temporal information. They are particularly well-suited for tasks like language modeling, where the goal is to predict the next token based on the historical sequence of previous tokens.
Basics of RNNs
- Latent Variable Models: RNNs use a latent variable model to approximate the probability of a token given all previous tokens. This is represented mathematically as:

  $$P(x_t \mid x_{t-1}, \ldots, x_1) \approx P(x_t \mid h_{t-1}),$$

  where $h_{t-1}$ denotes the hidden state at time $t-1$.
- Hidden State Calculation: The hidden state is updated at each timestep using the current input $x_t$ and the previous hidden state $h_{t-1}$ via a function $f$, as shown:

  $$h_t = f(x_t, h_{t-1}).$$

  This function, often nonlinear, allows the RNN to compactly represent the history of observed data up to the current timestep.
- Difference from Hidden Layers: Hidden states in RNNs should not be confused with hidden layers in other types of neural networks. The hidden state serves as an input to each step of the RNN, carrying the sequence's memory up to that point.
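As a concrete sketch of the update $h_t = f(x_t, h_{t-1})$, the following applies a tanh nonlinearity to randomly initialized weights; all sizes and variable names here are illustrative assumptions, not part of any particular library API:

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 4, 3            # illustrative sizes
W_xh = torch.randn(input_size, hidden_size)   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights
b_h = torch.zeros(hidden_size)                # hidden bias

def rnn_step(x_t, h_prev):
    """One application of f: combine current input and previous state."""
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

h = torch.zeros(hidden_size)   # initial hidden state h_0
x_t = torch.randn(input_size)  # current input vector
h = rnn_step(x_t, h)
print(h.shape)  # torch.Size([3])
```

Because `rnn_step` always maps the pair (input, previous state) to a new state of the same size, the hidden state stays a fixed-size summary no matter how long the sequence grows.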
Neural Networks without Hidden States
For a simpler neural network model like the Multi-Layer Perceptron (MLP) with a single hidden layer, the computation does not involve any temporal dynamics:

$$H = \phi(X W_{xh} + b_h), \qquad O = H W_{hq} + b_q,$$

where $\phi$ is an activation function, and $W_{xh}, W_{hq}$ and $b_h, b_q$ are the weight and bias parameters respectively.
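The single-hidden-layer computation can be sketched directly with tensor operations; the shapes below are arbitrary illustrative choices, with ReLU standing in for $\phi$:

```python
import torch

torch.manual_seed(0)
n, d, h, q = 2, 5, 4, 3  # batch size, input dim, hidden units, output dim
X = torch.randn(n, d)
W_xh, b_h = torch.randn(d, h), torch.zeros(h)
W_hq, b_q = torch.randn(h, q), torch.zeros(q)

H = torch.relu(X @ W_xh + b_h)  # hidden layer: depends only on the current X
O = H @ W_hq + b_q              # output layer: no memory of past inputs
print(O.shape)  # torch.Size([2, 3])
```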
Recurrent Neural Networks with Hidden States
In contrast to the non-recurrent model, RNNs maintain a hidden state across timesteps, updating it recurrently using both the current input and the previous hidden state:

$$H_t = \phi(X_t W_{xh} + H_{t-1} W_{hh} + b_h), \qquad O_t = H_t W_{hq} + b_q.$$
This recurrent update mechanism allows RNNs to remember information across many timesteps, making them ideal for tasks like time series forecasting and language modeling.
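A minimal sketch of this recurrent update, looping the matrix form $H_t = \phi(X_t W_{xh} + H_{t-1} W_{hh} + b_h)$ over a sequence; all shapes are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
n, d, h, T = 2, 5, 4, 6  # batch size, input dim, hidden units, sequence length
W_xh = torch.randn(d, h)
W_hh = torch.randn(h, h)
b_h = torch.zeros(h)

H = torch.zeros(n, h)  # initial hidden state H_0
for t in range(T):
    X_t = torch.randn(n, d)  # input minibatch at timestep t
    # The new state mixes the current input with the carried-over state,
    # so H summarizes everything seen so far.
    H = torch.tanh(X_t @ W_xh + H @ W_hh + b_h)
print(H.shape)  # torch.Size([2, 4])
```

Note that the same parameters $W_{xh}, W_{hh}, b_h$ are reused at every timestep, which is what keeps the parameter count independent of sequence length.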
RNN-Based Character-Level Language Models
An RNN can be used to model language at the character level, where the network predicts the next character based on the past sequence of characters. This approach involves:
- Shifting the sequence by one position to align inputs and labels for training (e.g., for the word "machine": input "machin", label "achine").
- Using softmax and cross-entropy loss to train the model on predicting the next character in the sequence.
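The input/label shift and the cross-entropy training signal can be sketched as follows; the text and vocabulary are illustrative placeholders, and the random logits stand in for an untrained model's predictions:

```python
import torch
import torch.nn.functional as F

text = "machine"
vocab = sorted(set(text))                     # character vocabulary
stoi = {ch: i for i, ch in enumerate(vocab)}  # char -> index

# Shift by one position: predict each next character from the current one.
inputs = torch.tensor([stoi[c] for c in text[:-1]])  # "machin"
labels = torch.tensor([stoi[c] for c in text[1:]])   # "achine"

# A model would emit one logit vector per input character; cross-entropy
# (softmax + negative log-likelihood) scores them against the shifted labels.
logits = torch.randn(len(inputs), len(vocab))
loss = F.cross_entropy(logits, labels)
print(inputs.shape, labels.shape)
```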
RNN in Python
Below is a basic example using PyTorch:
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # batch_first=True means inputs have shape (batch, seq_len, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)          # out: (batch, seq_len, hidden_size)
        out = self.fc(out[:, -1, :])  # project the last timestep's hidden state
        return out

# Example usage
rnn = SimpleRNN(input_size=10, hidden_size=20, output_size=1)
x = torch.randn(5, 10, 10)  # (batch_size, sequence_length, input_size)
output = rnn(x)
print(output)
This code defines a simple RNN module using PyTorch's nn.RNN layer. It runs the input sequence through the recurrent layer and passes the hidden output of the final timestep through a fully connected layer to produce the prediction.
Customizing an RNN in PyTorch
Here is a condensed example adapted from the PyTorch tutorial on building a character-level RNN that classifies names by their language of origin. It covers helper functions for preparing the data, the RNN model itself, and the training loop; a few pieces (such as randomTrainingExample and categoryFromOutput) are defined in the data-preparation part of the full tutorial.
import math
import random
import time
import unicodedata

import torch
import torch.nn as nn

# Helper function to convert a Unicode string to plain ASCII
def unicodeToAscii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

# RNN model: concatenates the input with the previous hidden state
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

# Train on one example; `rnn`, `criterion`, and `learning_rate` are assumed
# to be defined beforehand as in the full tutorial (e.g. criterion = nn.NLLLoss()).
def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()
    rnn.zero_grad()
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()
    # Manual SGD step: move each parameter along its negative gradient
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item()

# Pretty-print elapsed time
def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

# Training loop; randomTrainingExample and categoryFromOutput come from the
# data-preparation part of the tutorial.
n_iters = 100000
print_every = 5000
plot_every = 1000

all_losses = []
current_loss = 0
start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d %d%% (%s) %.4f %s / %s %s' % (
            iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0
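The loop above relies on several names (criterion, learning_rate, randomTrainingExample, categoryFromOutput) that the full tutorial defines during data preparation; they must exist before the loop runs. The following is a minimal, hypothetical sketch of what they might look like; the two categories and example names are placeholder data, not the tutorial's dataset:

```python
import random
import string
import torch
import torch.nn as nn

# Hypothetical stand-ins for the tutorial's globals; the real code builds
# category_lines by reading the per-language name files from disk.
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)
all_categories = ['English', 'French']
category_lines = {'English': ['Smith'], 'French': ['Dubois']}
n_categories = len(all_categories)

criterion = nn.NLLLoss()  # matches the model's LogSoftmax output
learning_rate = 0.005

def lineToTensor(line):
    # One-hot encode a name as a (seq_len, 1, n_letters) tensor.
    tensor = torch.zeros(len(line), 1, n_letters)
    for i, letter in enumerate(line):
        tensor[i][0][all_letters.find(letter)] = 1
    return tensor

def randomTrainingExample():
    # Pick a random (category, name) pair and tensorize both.
    category = random.choice(all_categories)
    line = random.choice(category_lines[category])
    category_tensor = torch.tensor([all_categories.index(category)])
    return category, line, category_tensor, lineToTensor(line)

def categoryFromOutput(output):
    # The largest log-probability is the model's guess.
    top_n, top_i = output.topk(1)
    category_i = top_i[0].item()
    return all_categories[category_i], category_i
```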